Boundary Proposal Network for Two-stage Natural Language Video Localization

Abstract

We aim to address the problem of Natural Language Video Localization (NLVL), that is, localizing the video segment corresponding to a natural language description in a long and untrimmed video. State-of-the-art NLVL methods are almost all one-stage, and can typically be grouped into two categories: 1) the anchor-based approach, which first pre-defines a series of video segment candidates (e.g., by sliding window) and then does classification for each candidate; 2) the anchor-free approach, which directly predicts the probabilities for each video frame as a boundary or an intermediate frame inside the positive segment. However, both kinds of one-stage approaches have inherent drawbacks: the anchor-based approach is susceptible to the heuristic rules, further limiting its capability of handling videos with variant length, while the anchor-free approach fails to exploit the segment-level interaction, thus achieving inferior results. In this paper, we propose a novel Boundary Proposal Network (BPNet), a universal two-stage framework that gets rid of the issues mentioned above. Specifically, in the first stage, BPNet utilizes an anchor-free model to generate a group of high-quality candidate video segments with their boundaries. In the second stage, a visual-language fusion layer is proposed to jointly model the multi-modal interaction between the candidate and the language query, followed by a matching score rating layer that outputs the alignment score for each candidate. We evaluate our BPNet on three challenging NLVL benchmarks (i.e., Charades-STA, TACoS and ActivityNet-Captions). Extensive experiments and ablative studies on these datasets demonstrate that BPNet outperforms the state-of-the-art methods.
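
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`propose_segments`, `match_score`, `localize`) are hypothetical, the product of start and end probabilities is an assumed way to rank proposals, and mean pooling with cosine similarity is a simple stand-in for the learned visual-language fusion and matching score rating layers.

```python
import math
from itertools import product


def propose_segments(start_probs, end_probs, top_k=3):
    """Stage 1 (sketch): turn per-frame boundary probabilities into proposals.

    Assumption: a proposal (s, e) is ranked by start_probs[s] * end_probs[e].
    """
    scored = []
    for s, e in product(range(len(start_probs)), range(len(end_probs))):
        if e >= s:  # a segment cannot end before it starts
            scored.append((start_probs[s] * end_probs[e], s, e))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(s, e) for _, s, e in scored[:top_k]]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def match_score(frame_feats, query_feat, segment):
    """Stage 2 (sketch): score one candidate against the query.

    Assumption: mean-pooled frame features compared to the query by cosine
    similarity, in place of a learned fusion and scoring layer.
    """
    s, e = segment
    frames = frame_feats[s:e + 1]
    pooled = [sum(f[d] for f in frames) / len(frames)
              for d in range(len(query_feat))]
    return cosine(pooled, query_feat)


def localize(start_probs, end_probs, frame_feats, query_feat, top_k=3):
    """Generate proposals, then return the one that best matches the query."""
    candidates = propose_segments(start_probs, end_probs, top_k)
    return max(candidates, key=lambda seg: match_score(frame_feats, query_feat, seg))


# Toy input: 5 frames with 2-dimensional features; frames 1-2 resemble the query.
frame_feats = [[0.0, 1.0], [1.0, 0.2], [1.0, -0.2], [0.0, 1.0], [0.0, 1.0]]
query_feat = [1.0, 0.0]
start_probs = [0.1, 0.6, 0.1, 0.1, 0.1]
end_probs = [0.1, 0.1, 0.5, 0.2, 0.1]
best = localize(start_probs, end_probs, frame_feats, query_feat)
print(best)
```

The point of the sketch is the division of labour: the first stage only needs frame-level predictions, while the second stage looks at each candidate segment as a whole, which is the segment-level interaction that the abstract says one-stage anchor-free methods lack.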

Similar resources

Natural language querying for video databases

Video databases have become popular in various areas due to recent advances in technology. Video archive systems need user-friendly interfaces to retrieve video frames. In this paper, a natural language processing (NLP) based user interface to a video database system is described. The video database is based on a content-based spatio-temporal video data model. The data model is focus...

An Efficient Adaptive Boundary Matching Algorithm for Video Error Concealment

Sending compressed video data in error-prone environments (such as the Internet and wireless networks) might cause data degradation. Error concealment techniques try to conceal errors in the received data at the decoder side. In this paper, an adaptive boundary matching algorithm is presented for recovering the damaged motion vectors (MVs). This algorithm uses an outer boundary matching or directional tempo...

Natural language descriptions for video streams

Collections of digital images and videos have grown exponentially in recent years as more and more data becomes available in the form of personal photo albums, handheld camera videos, feature films and multilingual broadcast news videos, presenting visual data ranging from unstructured to highly structured. Today video data accounts for 80 percent of all network traffic. There is a need for quali...

Two-Stage Neural Network For

A new system to segment and label CT/MRI brain slices using feature extraction and unsupervised clustering is presented. Each volume element (voxel) is assigned a feature pattern consisting of a scaled family of differential geometrical invariant features. The invariant feature pattern is then assigned to a specific region using a two-stage neural network system. The first stage is a self-organizin...

Thesis Proposal Verb Semantics for Natural Language Understanding

A verb is the organizational core of a sentence. Understanding the meaning of the verb is therefore key to understanding the meaning of the sentence. Natural language understanding is the problem of mapping natural language text to its meaning representation: entities and relations anchored to the world. Since verbs express relations over their arguments in text, a lexical resource about verbs ...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i4.16406